Ortholog genes from cactophilic Drosophila provide insight into human adaptation to hallucinogenic cacti

Cultural transformations of lifestyles and dietary practices have been key drivers of human evolution. However, while most of the evidence of genomic adaptations is related to the hunter-gatherer transition to agricultural societies, little is known on the influence of other major cultural manifestations. Shamanism is considered the oldest religion that predominated throughout most of human prehistory and still prevails in many indigenous populations. Several lines of evidence from ethno-archeological studies have demonstrated the continuity and importance of psychoactive plants in South American cultures. However, despite the well-known importance of secondary metabolites in human health, little is known about its role in the evolution of ethnic differences. Herein, we identified candidate genes of adaptation to hallucinogenic cactus in Native Andean populations with a long history of shamanic practices. We used genome-wide expression data from the cactophilic fly Drosophila buzzatii exposed to a hallucinogenic columnar cactus, also consumed by humans, to identify ortholog genes exhibiting adaptive footprints of alkaloid tolerance. Genomic analyses in human populations revealed a suite of ortholog genes evolving under recent positive selection in indigenous populations of the Central Andes. Our results provide evidence of selection in genetic variants related to alkaloids toxicity, xenobiotic metabolism, and neuronal plasticity in Aymara and Quechua populations, suggesting a possible process of gene-culture coevolution driven by religious practices.

The dispersal of human populations out of Africa almost 100,000 years ago has been accompanied by the colonization of almost every terrestrial habitat, resulting in conspicuous ethnic differences across regions 1 . Populations responded both culturally and genetically to the specific environmental conditions, resulting in further dramatic changes with the advent of agricultural and horticultural societies ~ 8,500 years ago 2 . One of the best pieces of evidence is the ability to digest novel foods such as lactose of dairy cattle 3 and carbohydrates from crops 4 , or to tolerate potentially toxic substances like alcohol 5 , salt 6 and arsenic 7 . Despite these examples illustrating how lifestyles and dietary factors can shape global patterns of genetic variation in human populations, the role of ancient traditions has received little attention.
Candidate genes for alkaloid response. To identify the genes implicated in the adaptation to the columnar cactus and especially its alkaloid fraction, we assessed the differential genomic expression of D. buzzatii when reared in T. terscheckii (compared to its prevalent resource of prickly pears), and in higher doses of alkaloids. i.e., T. terscheckii vs O. sulphurea (DEG I); and T. terscheckii 2 × alkaloids vs T. terscheckii (DEG II). We found a total of 127 protein-coding genes differentially expressed across both comparisons (Table S2). Out of the total number of differentially expressed genes, we used the homologs of D. melanogaster (showing informative annotations) for downstream analyses. Specifically, 23 genes were over-expressed and 58 genes were underexpressed in comparison DEG I, while 33 genes were over-expressed and only 3 genes were under-expressed in the comparison DEG II (Adh, Cyp6a2, Cyp6d5, Cyt-b5-r, and Cyp309a* were consistently over-expressed in both comparisons; Fig. 3A; Table S3). To identify the physiological targets of the consumption of the hallucinogenic cactus, we tested whether there is a functional enrichment for particular canonical pathways. The combination of gene sets exhibiting over-expression levels in treatments with comparatively higher concentrations of phenethylamine alkaloids (i.e., T. terscheckii for comparison DEG I; and T. terscheckii 2 × alkaloids for comparison DEG II) were enriched in Gene Ontology (GO) terms mostly related to the detoxification metabolism (Table S4). Major biological pathways were related to the first step of xenobiotic reactions such as oxidation and functionalization of foreign compounds (Aldh, Cyp309A1, Cyp6D5, Jheh3, Cyp6A2, Cyp309A2), neuronal processes such as neurotransmitter clearance and serotonergic mechanisms (Aldh), or related to general metabolism (Baz, Ho, EF2) ( Fig. 3B; Tables S3 and S5). On the other hand, genes under-expressed in the same comparisons were enriched in GO terms related to general metabolism and developmental processes. Enriched biological pathways were mainly related to central processes of cells, such as energy metabolism (Eno, Pepck, Ald, CG10924, Desat2, Impl3, Men) and ATP synthesis (ATPsynC) ( Fig. 3C; Table S6). Using our list of differentially expressed genes in flies reared in the columnar cactus and its alkaloid fraction (DEG I and DEG II), we searched orthologs sequences in the Human genome. Our analysis revealed a total of 70 genes exhibiting moderate to high orthology confidence with Homo sapiens (Fig. 3D). The GO terms found in the over-expressed genes in treatments containing comparatively higher doses of alkaloids involved several important processes such as the regulation of neurotransmitters (ATF4, ASIC1), nervous system development (ATF4, ATP2A1, PARD3, LSAMP, DSCAM, TENM3, EEF2, CTSV, CTSF), oxidative stress (ALDH2, HPGD,  DHRS11, PHYHD1, HMOX2, PRDX1), exogenous drug catabolic processes, (CYP4B1, EPHX1, GSTT1, GSTT2B,  TBXAS1), alkaloid detoxification (CYP3 and CYP4 family genes), general metabolism (FADS1, HPGD) and response to narcotics (ALDH2, ASIC1) (Tables S5 and S7). Important processes of under-expressed genes were also associated to the regulation of neurotransmitters, including catecholamines (ACTB, PEBP1, DBI), nervous

Positive selection footprints in human populations.
To investigate potential signatures of recent selection for alkaloids tolerance in human populations, we analyzed the extent of genetic differentiation and haplotype homozygosity in Native communities of the Central Andes exhibiting a long history of consumption of hallucinogenic cacti. For this, we used two genomic datasets including our target groups (within the area of influence of Andean shamanic practices) and reference populations (exhibiting native genetic background but outside the Central Andean region). Data-set 1 included the Aymara and Quechua as target populations and Yukpa and Bari as reference populations, while our Data-set 2 consisted of Aymara and Quechua as target groups, and Wichi and Yanesha as references (Fig. 1). Our principal component analysis showed that the retained individuals according to their genetic ancestry differentiated into five distinct genetic clusters corresponding to the Central Andean population (Aymara and Quechua), two North Andean/Caribbean populations (Baris and Yukpas-reference populations for Data Set 1), one Gran Chaco and one western Amazonia populations (Wichi and Yanesha-reference populations for Data Set 2; Table S9; Figures S4-S9). Noticeably, all individuals from the Central Andes composed a condensed genetic group independently of the sampling scheme or dataset ( Figure S9). Our estimated genome-wide fixation index revealed that genetic differentiation with respect to central Andean groups (Aymara and Quechua) was larger for the reference communities Bari and Yukpa from the northern Andean region (Fst ~ 0.08-0.17), followed by the Wichi population from the Gran Chaco (Fst ~ 0.08-0.1) and to a lesser extent with the Amazonian Yanesha (Fst ~ 0.04-0.07), which is largely consistent with their geographic distributions ( Fig. 1; Figure S8). To assess whether our catalog of ortholog genes has been a target of recent positive selection in the Central Andean population, we calculated for each variant the integrated Homozigosity Score (|iHS|) for Extended Haplotype Homozigosity, and Likelihood Ratio Test (LRT) score for genetic differentiation (using Han Chinese as the general reference population and one reference population of South America: Yukpa or Bari for Data Set 1 and Yanesha or Wichi for Data Set 2). Then, we combined the indexes into gene-level summary statistics (mean and median) to finally merge the resulting P-values into a Fisher's combination score (Z F ). We identified ten candidate genes showing evidence of evolution under the influence of positive selection in at least one comparison; i.e., genes with identified signal either one (CYP3A43, HPGD, LDHA and TPM1), two (ATP2A1, CTSF and FADS1), three (ALDH2), or four comparisons (COX5A, CYP3A4). Our results were highly consistent when considering either the median or mean values to summarize at the gene level the |iHS| and LRT scores estimated at the variant level, especially when comparing against Bari, Yukpa and Wichi reference populations ( Fig. 4A; Table S10). Finally, we tested whether the proportion of genes showing selection signatures were greater than that expected in the genetic background of our reference populations. We found a significant enrichment of our candidate genes for alkaloid adaptation in most comparisons, except when Yanesha was used as the reference population ( Fig. 4B; Table S11).
Altogether, our results were consistent with selection signatures in our candidate genes related to alkaloids response, for the Central Andean population when compared to the reference ones, except in Yanesha, which showed the lowest evidence of selective sweeps (Fig. 4). This might be related to a combination of genetic and cultural factors, such as historical migratory flows and shamanic influences from the Andean region. In fact, although Yanesha exhibits a predominant Amazonian genetic ancestry, our pairwise Fst test revealed low genetic differentiation respect to Central Andean groups ( Figure S8), while our admixture analysis confirmed a certain degree of shared ancestry with the Andean population 44 ( Figure S5), coherent with its geographic location, adjacent to the eastern slope of the Andes (Fig. 1). This is consistent with recent studies revealing ancient longitudinal gene-flow between north-central Andean populations and Amazonian tribes, as well as with the reported cultural and commercial interactions, including the sharing of practices and trade of psychotropic plants 45 . Thus, although Yanesha currently exhibits a predominant Amazonian background, it is possible that ancient connectivity with major Andean civilizations might have influenced their patterns of adaptive genetic structure.

Discussion
Our in vivo study of transcriptomic responses in the genetic model of cactophilic Drosophila allowed us to identify a suite of candidate genes for the consumption of hallucinogenic cactus that are conserved across distant taxa. We found ten human ortholog genes showing evidence of positive selection in Native Andean populations with a long history of shamanic practices using cactus preparations. Our analyses indicated that the ontology of the selected variants is mostly associated with xenobiotic metabolism, chemical toxicity and neuronal processes, supporting the idea that these regions have been targets of the selection pressure imposed by cactus alkaloids. Overall, our findings suggest that although dietary adaptation has likely shaped the genetic diversity related to plant toxins, local shamanic practices were also possibly important contributors to the recent evolution of ethnic differences. Our results showed that the variants under selection in the Central Andean population were strikingly coincident with the expected effects of cactus alkaloids. For instance, COX5 genes encoding for cytochrome c oxidase play a vital role in the mitochondrial redox system and has been shown to be involved in the stress response in rat brains exposed to morphine, the main alkaloid of opium 46 , while CTSF is a widely expressed lysosomal cysteine protease implicated in the processing and degradation of many essential neuronal proteins, and thereby related with possible neuroprotective effects 47 . Our detection of ALDH2 is interesting as this gene is not only associated with alcohol dependence in humans, but is also an important regulator of dopaminergic and serotonergic systems implicated in protective effects against opioids addiction 48  Only candidate genes with at least one significant comparison (i.e., P-value associated to Z F gene-level positive selection score < 0.05) are shown. Squares and circles denote comparisons when median and mean are used as gene-level summary statistics to estimate Z F , respectively. (B) Enrichment of signals of selection in the Central Andean population for the set of candidate genes. Left and right halves of violin plots represent the expected distribution of the proportion of background genes exhibiting signals of selection across the genome (estimated from 1,000 control gene sets) when mean and median values were used as summary statistics to estimate Z F , respectively. The proportion of selective signals in our set of candidate genes is represented by squares and circles when median and mean are used as gene-level summary statistics to estimate Z F , respectively. We considered as a signal of selection for a gene when the P-value associated to its Z F score is < 0.05, when using each reference population separately (colored violin plots) or when at least one, two or three P-values associated to Z F scores (using different reference populations) are < 0.05 (gray violin plots) . The significance of the enrichment analysis (i.e., higher proportion of signals of positive selection in the candidate gene set respect to the control gene sets) is shown at the top: †P < 0.1; *P < 0.05; **P < 0.01.  49,50 . The expression of ATP2A1 has been related to the induced cardiotoxicity of alkaloids, suggesting an important role in maintaining calcium homeostasis during cardiac arrhythmia 51,52 , while CYP3A genes, members of the P-450 family gene, encode for one of the most important CYP isoforms (CYP3A4, CYP3A5, CYP3A7 and CYP3A43) responsible of alkaloids detoxification and clearance of psychotropic medication in humans 53,54 . These findings are in line with our detection of several psychotropic and toxic alkaloids in T. terscheckii. Mescaline, for example, the major hallucinogenic alkaloid of the genus Trichocereus, and 2-phenylethylamine, a precursor of mescaline with mild psychotropic properties 55 , have both been associated with lethal and teratogenic effects in rodent embryos 26,27,56 . Although little is known on the action of the mescaline derivative alkaloids (N-methlymescaline, trichocereine and N-acetylmescaline), early studies suggested no or mild psychoactive effects 57 , whereas tyramine, N-methyltyramine, hordenine and dopamine are likely precursors of these alkaloids, displaying neuromodulatory properties with possible adverse effects 25,58 . In particular, tyramine was reported to exert oxidative changes in the brain of rats similar to that of narcotics 59 , while excessive intake in humans results in a toxicological response known as the "cheese reaction", leading to hypertension, migraines, neurological problems and respiratory disorders that can induce heart failure and brain hemorrhage 60 . Further, hordenine and 3,4-dimethoxyphenethylamine exhibit inhibitory effects on monoamine oxidases responsible for alkaloids degradation, potentiating the physiological effects of cactus alkaloids 25,60-62 . Moreover, the preparation of San Pedro is usually performed by boiling the plant for several hours to potentiate its effects 9 , implying that even minority alkaloids could accumulate in significant quantities, influencing both toxic and psychotropic properties 50 . Indeed, although the hallucinogenic experience is the most relevant attribute of San Pedro cacti, other common physiological effects include respiratory failure, diffuse anxiety, motor dysfunction, partial or total cardiac arrest, or even death 9,50 . Further, regular use of hallucinogenic plants by children, pregnant, and breastfeeding women has been widely reported in Native American populations, raising important health concerns 29,30 . For instance, medical investigations reported significant associations between drug abuse and maternal and fetal morbidity 63 , while recent family studies suggest an important inherited predisposition driving the variability to drug response and addiction 32,33 . Taken together, our results suggest that the detection of subtle but significant shifts in allele frequencies of genes implicated in alkaloids metabolism could likely be related to their fitness value in the Central Andean region where humans have been consuming San Pedro cacti for millennia. It has been argued that genetic differentiation between ethnic groups is largely related to the ingestion of substances and their detoxification [64][65][66] . In fact, dietary adaptation has been a major contributor to human evolution, especially important during the transition from generalized plant diets of early hominids to the increased meat consumption and mastery of fire in Homo, leading to reduced ingestion of plant toxins 67,68 . Later cultural transitions to intensive agriculture, horticulture and animal domestication starting ~ 10,000 years ago resulted in further changes of the selective pressures on the human genome. Some common examples of modern instances representing gene-culture coevolution include lactase persistence during adulthood 3 , tolerance to alcoholic fermentation 5  www.nature.com/scientificreports/ derived from our early hominid diet may have provided the basis for rapid selective responses in shamanic societies with a deep history of consumption of plant secondary metabolites. Our findings are consistent with previous population genomic analyses combining ecological information, which found subtle shifts in allele frequencies of genes related to diets rich in roots and tubers in boreal human populations, suggesting that several SNPs may be involved in the detoxification of plant secondary metabolites 65 . Indeed, an increasing number of studies are discovering ethnic distributions of selected detoxification genes 69,70 , likely associated to novel toxins derived from modern lifestyles 66,69,71 , and recent advances in the field of pharmacogenomics have revealed important variation across populations in SNPs associated with opioid dose variability 32 . Our finding of signatures of selective sweeps associated with CYP3A43 and especially in CYP3A4 is in agreement with several studies where signals of positive selection in CYP3A genes were found in African, Asian and European populations, suggesting that this locus is sensitive to natural selection 66 . Moreover, CYP3A43 has been associated with the clearance of neuroleptic drugs (antipsychotics), while CYP3A4 is considered the most important drug metabolizing CYP enzyme in humans (> 50% of all drugs), and a major responsible for the metabolism of opioids and other alkaloids with important clinical implications in drug addiction 54,72,73 . Although little is known on the phenotypic effect of most polymorphisms, previous studies have provided evidence of higher activity of the wild-type CYP3A4*1 than that of most isoforms when exposed to the quinine alkaloid 72 . Thus, the higher metabolic rate associated to the wild-type found in Quechua and Aymara populations could have provided a selective advantage to detoxify cactus alkaloids, contributing to the allele fixation in the Central Andes.

Considerations and caveats. Elucidating the antiquity and extent of the use of psychotropic cacti in the
Andean culture is a difficult task, if not impossible. The lack of writing systems in early Andean societies only allows us to infer the relevance of shamanic traditions through the indirect evidence left in archaeological settlements or in the inertia of the cultural legacy. Notwithstanding, most studies agree that the use of San Pedro seems to have been a critical element of the cultural evolution throughout most of the prehistory of Andean civilizations, even prior to the first expansion associated with the religious power and theocratic rule of Chavın and Cupisnique cultures almost 3,500 years ago 15,[22][23][24] . More than twenty years of archaeological research in northern Peru revealed that the civilization of Caral, the oldest in the Americas (5,000 ya), exhibited one of the largest early urban complexes in the world probably based on communal spiritual exaltation 74 . Detailed studies on the archaeological site have found evidence of cactus trade and numerous shells of the cactophilic snail Scutalus proteus used for concentrating the alkaloids of San Pedro 14 , suggesting the deep and widespread history of cactus use in the region (it is important to note that the consumption of cacti does not require any paraphernalia, and thus it is likely that the predicted range of its use in the past is underestimated). Shamanic traditions extended far in time, to the recent Inca Empire, and even survived the suppression of European colonization, which failed to eradicate the use of San Pedro that still persists today 9,15,50 .
Although it is important to note that the effects of selection and demography are difficult to disentangle, the examination of multiple loci should be informative of these processes because population fluctuations result in similar random effects across loci, while positive selection tends to be locus-specific. Furthermore, our outlier approach and the combination of our two tests considering both genetic differentiation and linkage disequilibrium should be robust to the confounding effects of demography 75,76 . Notwithstanding, our study has several methodological limitations that require discussion. First, we cannot rule out that the signals of selection detected in our analysis could have been driven by adaptive evolution in response to a complex combination of selective pressures that could also be potentially important for population health. For instance, missense of CTSF has been related to the Kufs neuronal disease 47 while variants of COX5A result in lactic acidemia, pulmonary arterial hypertension and failure to thrive 77 . Some mutations in ATP2A1 can result in a rare autosomal recessive myopathy (Brody disease), characterized by exercise-induced muscle stiffness 78 , whereas variations in FADS1 have been associated with ethnic differences and medical phenotypes such as metabolic syndrome, abnormal lipid metabolism, and attention deficit disorder/hyperactivity 79 . In particular, CYP3A is not only a critical genetic contributor to drug clearance, but also has important implications in the endogenous metabolism of several hormones such as estrogens, cortisol and corticosterone associated with cancer risk and sodium retention related to hypertension and pregnancy complications 80 . The diversity of exogenous and endogenous substrates acting on CYP3A suggests that multiple and even antagonist selective pressures have likely shaped pleiotropic functions and potential trade-offs, resulting in complex genotype-phenotype relationships. Thus, further studies on the endogenous-exogenous metabolism interaction are needed to better understand their relative importance for phenotypic associations and ethnic considerations. Secondly, despite recent studies in Drosophila are significantly increasing our overall understanding of the genetic basis to the susceptibility to psychotropic drugs, such as cocaine, morphine, amphetamine and methamphetamine 34,81,82 , it is worth to note that several genes are likely to be overlooked, partly due to the divergent functionalities and physiological differences between flies and humans. Third, we acknowledge that many human orthologs of our candidate genes might not be shared across our studied datasets, resulting in the underestimation of the number of genes. Indeed, recent comparative studies of flies and humans recognize that the overall low number of reported genes associated to drug response, is likely due to the current low representation of human studies 33 . This is especially critical in our case, as the reduced genotyping effort of South American Native communities implied a limited sampling for both our target and reference populations 83,84 . Thus, a larger collection of whole-genome sequences is necessary to supply larger sample sizes and SNP density to capture a larger number of genes and discriminate whether positive selection preferentially acted on protein-coding or regulatory regions.

Conclusions
Although more efforts are certainly needed to test our hypotheses, this study provides a first step in addressing the complex interplay between cultural and genetic co-evolution in the Central Andes, that would otherwise be extremely difficult to investigate. Our identification of candidate genes for alkaloid tolerance in Andean populations not only contributes to a better understanding of how ancient practices may have contributed to recent human evolution, but also provides insight on the genetic basis of ethnic differences in many disease risks. Several genetic variants related to alkaloid responses are also implicated in addictions, neurological disorders, and the metabolism of many important drugs. Therefore, the identification of ancient adaptive footprints to natural drugs across different ecosystems is fundamental for human health and especially important for the development of personalized genomic medicine. A larger representation of South American genomes combined with functional validation studies will be key to demonstrate the putative role of shamanism in the recent evolution of the human genome.

Materials and methods
Toxicological experiment. The experimental set up has been previously described in detail 40 . Briefly, we collected individuals of D. buzzattii from a wild population of Northwestern Argentina to generate nearly isogenic lines of three conspicuous genotypes (homozygous for the second chromosome inversions: standard, j and jz3). We also collected fresh pieces of their hosts present in the area: the giant columnar cactus Trichocereus terscheckii and the prickly pear cactus Opuntia sulphurea. To assess the effect of the columnar cactus and the hallucinogenic alkaloids on the genomic expression of D. buzzattii, we exposed 1 st instar larvae to each cactus, including the addition of the alkaloid fraction of T. terscheckii (2 × the native concentration . We extracted total RNA from 10 batches of 3 rd instar larvae for each genotype (biological replicates), using a TRIzol/ RNeasy protocol specific to Drosophila. We used the Illumina paired-end library (insert size: 150-450 bp) and sequenced in a HiSeq 2000 platform with 101-cycle reads, obtaining a mean of 16 Gbp of raw reads per genotype and treatment (NCBI Accession: PRJNA314520).

Alkaloids identification.
To thoroughly characterize the alkaloid fraction of T. terscheckii, we collected fresh samples from five individuals at different times of the year to account for temporal variation in alkaloid concentration. The plant material was collected in the northwest of Argentina, in the Valle Fertil Natural Park (public land of the province of San Juan) and the identification of the cactus was performed by Alejandro Saint-Esteven et al 85 . We isolated alkaloids from plant tissues through the alkaline-CH 2 Cl 2 extraction method 16,40 . The chemical profile of the extracted fraction was accomplished by GC-MS (Thermo Scientific EM/DSQ II-Trace GC Ultra AI3000). Alkaloids identification was performed by comparing the retention times and mass spectra with reference standards and database spectra (UMYMFOR and NIST). The relative variation and abundance of each alkaloid was quantified based on the integration of the peaks area across the gas chromatograms. We analyzed five GC quality standards (> 98%; Sigma Aldrich) of the most representative alkaloids present in the genus Trichocereus 21 : Tyramine; Hordenine; 3,5-dimethoxy-4-hydroxyphenethylamine; 3,4-dimethoxyphenethylamine; 3-methoxytyramine. Given the occurrence of dopamine as a major biosynthetic precursor of mescaline and other substituted phenethylamines in giant columnar cacti 21 , we performed an acid extraction to pH 3 with 0.1 M HCl solution. Dopamine identification was performed by comparing the retention time and mass spectra of the acidic extract and a dopamine reference standard (> 99%; Sigma Aldrich) through HPLC-tandem Mass Spectrometry (Waters Quattro Premier XE spectrometer). To identify mescaline and trichocereine (N,N-dimethylmescaline) alkaloids, we used their solubility differences, extracting with chloroform and ether, respectively 86

Differential expression and gene orthology.
To evaluate the effects of the columnar cactus and its alkaloid fraction on D. buzzatii, we analyzed differential gene expression between flies reared in: T. terscheckii vs O. sulphurea (DEG I); and T. terscheckii 2 × alkaloids vs T. terscheckii (DEG II). Given that Native communities do not isolate the alkaloids, but rather concentrate all the characteristics of the plant, our first comparison (DEG I) allowed us to account for the differential expression caused by the columnar cactus (overall plant effect-with respect to the use of prickly pears), while our second comparison (DEG II) allowed us to elucidate the specific effects of high doses of hallucinogenic alkaloids (alkaloids effect). For the search human orthologs, we combined the significantly differentially expressed genes of both comparisons to contemplate the distinctive effect of the columnar cactus and especially its alkaloid fraction. Raw reads of our nine transcriptomes (three genotypes per treatment) were quality controlled using FASTQC v.0.10.1 87 and filtered for quality scores ≥ 25 and minimum lengths ≥ 25 bp, resulting in a mean of 12 Gbp per genotype in each treatment. We used the program RSEM v1.2.30 88 to estimate gene expression levels in fragments per kilobase of transcript per million mapped reads of protein-coding genes using the reference genome of D. buzzatii 39 . To analyze the differential gene expression, we used the NOISeqBIO method of the R package NOISeq v2.18.0 89 with a false discovery rate (FDR) < 0.01, and applied a posteriori filter using a custom script to further reduce possible spurious "noise". The statistical strategy www.nature.com/scientificreports/ of NOISeq considers the differences in both the mean expression level and in the order of magnitude to measure changes in gene expression between conditions, and thus identify significantly differentially expressed genes. We searched human ortholog genes from our filtered gene list through the DRSC Integrative Ortholog Prediction Tool (DIOPT) v6.0.1 90 which facilitates the identification of orthologs by calculating a score for the support of a given orthologous gene-pair relationship, as well as by a weighted score based on the functional assessment of molecular function annotation of all fly-human orthologous pairs predicted by each tool and algorithm. We retained the genes with moderate (best score in the forward or reverse search and DIOPT score > = 2 or DIOPT score > = 4) and high confidence rank (best score in both forward/reverse search and DIOPT score > = 2). Because this tool uses D. melanogaster genes as input, we first performed a cross-species analysis to identify homologous genes with D. buzzatii and carried out a detailed exploration of the functional annotation using g:Profiler 91 . To interpret the biological information in the context of Homo sapiens (signaling, metabolic molecules and their pathways and processes), we analyzed our ortholog genes in the Reactome Knowledgebase 92 .
Human genomic data. We aimed at evaluating genomic signals of positive selection in human populations from Central Andes inhabiting the historical area of influence of shamanic practices employing columnar cacti. For this, we leveraged two sets of published single nucleotide polymorphisms (SNPs) data (Table S9). Data-set 1 consists of two focal populations from Central Andes (Aymara and Quechua from Peru and Bolivia, respectively 93 ) and two reference populations from Northern Andes/Caribbean (Yukpa and Bari from Venezuela 94 ). Data-set 2 consists of three focal ethnic groups of the Central Andes (Aymara, Quechua and Uro from Peru and/or Bolivia), and two reference populations from the Gran Chaco in Argentina (Wichi), and western Amazonia (Yanesha) in Peru 44 . While Aymarans, Quechuans and Uros represented our target groups, the reference populations represented our "control" groups (i.e., Native genetic background, but outside the central Andean region) as they constitute putatively culturally divergent and relatively isolated populations 95 outside the distribution range of Trichocereus species (Fig. 1). We filtered our data to only retain individuals exhibiting a high degree of Native American ancestry. For this, we also considered Nahuan and Mayan populations from Mexico (Data-set 1) and Ashaninka, Cashibo, Huambisa, Shipibo from Peru, and Tzotzil from Mexico (Data-set 2). We removed SNPs and individuals exhibiting > 2% and > 5% of missing genotypes, respectively, and SNPs with Minor Allele Frequency (MAF) < 1%. We also removed second-degree relatives and given the potential admixture of our populations with non-Native individuals, we performed admixture analyses at the global level, including 405 African, 503 European and 347 American individuals from the Phase3 of the 1000 Genomes Project 96 . Our global admixture analysis was performed with a prior number of putative ancestral populations of K = 3 -10 (10 independent runs), resulting in a best model of K = 8 for both data sets ( Figure S4). We removed individuals exhibiting < 95% of Native American specific genetic ancestry ( Figure S5). To ensure sufficient genetic differentiation among our reference and target populations (likely reduced by historical migrations), we performed an additional local fine-scale analysis of genetic structure using FineStructure v4 97 , after phasing the data using the 1000 Genomes haplotypes as reference 96 . This analysis revealed the genetic clustering of all Central Andean individuals (both data sets), Bari, Yukpa (Data-set 1), Yanesha and Wichi (Data-set 2) individuals ( Figure S6). We further validated the genetic differentiation among groups through genome-wide pairwise F ST index and Principal Component Analysis using all retained individuals from both data sets. Both approaches showed that all Central Andean individuals grouped together, with the exception of four Uro individuals ( Figures S7 and S8) that were removed to obtain a genetically homogenous group. Thus, we selected a subset of individuals in order to avoid non-Native American ancestry while maximizing the genetic differentiation among our reference populations (Bari, Yukpa, Yanesha and Wichi) and with respect to the Central Andean group ( Figure S9). Altogether, our use of different data sets represent independent replicates for detecting potential signals of selection related to the use of hallucinogenic cactus in the Central Andes (see details in Supplementary Information).

Positive selection analyses.
For the analyses of positive selection, we employed two alternative approaches: the degree of genetic differentiation and the extended haplotype homozygosity. For the genetic differentiation test, we used the method implemented in the TreeSelect software 98 to contrast whether allele frequencies in any of the evaluated populations are significantly differentiated from the putative ancestral genetic background. We performed the TreeSelect test in the Central Andean population, using the reference population of Han Chinese 96 in all cases, and one reference population of South America (Yukpa or Bari for Data-set 1, and Yanesha or Wichi for Data-set 2; Table S9). Thus, in each data set we obtained two Log Ratio Test (LRT) scores for each SNP. For the extended haplotype homozygosity test, we calculated iHS scores for each SNP of our target population using the rehh package of the R software 99 . Datasets were filtered by MAF < 0.01 and missing genotype > 2%. We considered the mean and the median of |iHS| or LRT scores (at the SNP level) as summary statistics at the protein-coding gene level. In order to exclude possible stochastic signals of non-selective events, such as genetic drift or demographic fluctuations, we implemented a genome scan approach 100 by taking into account the gene-level background for protein coding regions. We computed the median and the mean summary statistics for both |iHS| and LRT scores and estimated gene-level empirical distributions for each genomic background as previously implemented 76 . Empirical P-values were calculated using the gene-level score distributions generated from the genes in the background genome set. Finally, LRT and |iHS| based scores were combined using the Fisher combination test 101 : www.nature.com/scientificreports/ where P i stands for the empirical P-value obtained from the test i. Gene-level formal P-values were finally derived from the X 2 distribution with 4 degrees of freedom 101 . We thus obtained a total of eight Z F scores per gene for both the mean or the median as summary statistics: two for Data Set 1 (with either Yukpa or Bari as the 2nd reference population for the TreeSelect test), and two for Data Set 2 (with either Yanesha or Wichi as the 2nd reference population for the TreeSelect test; see Tables S10a and S10b). We further tested whether our candidate genes have been preferentially targeted by recent positive selection in the Central Andean population by testing whether the proportion of genes with signals of selection was greater than that observed in 1,000 control gene sets. To generate the empirical distributions, we selected 1,000 genes (for each of our candidate genes) exhibiting the most similar recombination rate and number of SNPs analyzed, using the 1000 Genomes genetic map 96 . A given gene was considered to be under positive selection when its associated Z F score was significantly different from 0 with a type I error of 5%, using each of the four South American reference populations separately. In addition, we performed a complementary test to consider evidence of selection when at least one, two and three Z F scores were significant. We generated custom scripts to estimate the P-values by calculating the proportion of the 1,000 control gene sets exhibiting a greater number of signals of selection than observed in our set of candidate genes (see Data availability).

Data availability
Details of data analyses have been uploaded as part of the Supplementary Material. Additionally, the scripts and distributable data used in this study has been made available through the GitHub public repository (https:// github. com/ pierr espc/ cactus_ fly_ human_ natsel_ andes; https:// github. com/ diego mics/ cactus_ fly_ human). www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.